Abstract

In this paper we describe the HepML format and a corresponding C++ library
developed for keeping a complete description of parton level events in a
unified and flexible form. HepML tags contain enough information to
understand what kind of physics the simulated events describe and how the
events have been prepared. A HepML block can be included in event files in
the LHEF format. The structure of the HepML block is described by means of
several XML Schemas. The Schemas define the information required in the
HepML block and where this information should be located within the block.
The library libhepml is a C++ library intended for parsing and serialization
of HepML tags and for representing the HepML block in computer memory. The
library is an API for external software. For example, Matrix Element Monte
Carlo event generators can use the library for preparing and writing the
header of a LHEF file in the form of HepML tags. In turn, Showering and
Hadronization event generators can parse the HepML header and get the
information in the form of C++ classes. libhepml can be used in C++, C, and
Fortran programs. All necessary parts of HepML have been prepared and we
present the project to the HEP community.
1 Introduction



In the last 10-15 years Monte Carlo simulation programs have become one of
the main research tools in both phenomenological studies and experimental
analyses in high energy physics (HEP). This has resulted in a burst of new
programs available to researchers. Since several such programs usually have
to be interfaced to each other in practical calculations, the importance of
interfacing is rising. Generally, programs can be interfaced either in
computer memory, via shared libraries or unified executables, or externally,
via data files. The latter approach is more flexible and simpler to
implement, although it can be less reliable, since data files can be
corrupted, lost, etc. Currently intermediate data files, as a means of
interfacing programs and a method of data storage, are widespread in HEP.
In this paper we propose a new markup language for a unified description of
Monte Carlo simulated events at the parton level. This format, called High
energy physics Markup Language (HepML), is an extension of the Les Houches
agreements and an attempt to overcome several limitations of the Les Houches
event format (LHEF) [1].

Let us describe the problem we are going to tackle in this paper more
precisely. In order to realistically simulate collisions of two particles at
an accelerator within a physical model, we have to pass through several
steps of simulation (see more details in [2]). At first, we generate
so-called "partonic events" for a production process of one or several
particles we are interested in. The events are points in the phase space,
distributed according to the squared matrix element of the process. Examples
of programs which can prepare such events are ALPGEN [3], CompHEP [4],
MadGraph [5], HELAC [6], Whizard/OMega [7], AMEGIC++ [8], Comix [9], and
Grace [10]. However, these events are not what we can observe in particle
detectors, since most of the final particles in the events are not real
physical degrees of freedom. The final partons should form hadrons, which
can decay; the final leptons can radiate photons. This phase of simulation
is called showering (since the most important effect added here is the QCD
showers), hadronization, and decays. Currently, the most important players
at this stage of simulation are Pythia [11], Herwig++ [12], and Sherpa [13].
After applying these effects we obtain events with observable particles in
the final state. However, these events are still not what experimentalists
in HEP are interested in, since detector effects must be added to the events
in order to obtain realistic simulated detector output. Since the effects
are detector dependent, every HEP experiment develops its own simulation
software, in most cases not publicly available. Different agreements and
file formats exist for transferring simulated data through the whole
simulation chain. As we stated above, the HepML format is intended to be a
part of the LHEF format, which is currently a standard format for storing
partonic event files. In our opinion, the main limitation of LHEF is the
rigidity of the format structure. Certainly it is not a problem to develop a
simple and compact record for several phase space points: momenta of all
particles in an event and several additional numbers which characterize the
event. But LHEF does not have any means to keep information on physical
model parameters, applied cuts, and other highly important information. The
main obstacle is the internal heterogeneity of this information. Thus we
propose a rather simple format which can include blocks of data with various
structures. Parsers of the format should be relatively simple and should be
able to parse the HepML block even if it contains superfluous information.
The format should also be based on standard programming tools.



Therefore our main goal is to overcome the rigidity of the LHEF format by
means of standard tools. To achieve this we chose the ideology of markup
languages [14]. Markup languages, strictly speaking procedural markup
languages, are perfectly suitable for this goal, since they are used for
describing the structure of complex data. XML [15], as a standard instrument
for developing markup languages, is the most appropriate base for such a
format. Thus, HepML is a markup language describing the structure of data
within the LHEF format. Employing standard software technologies allows
users and developers to re-use a lot of reliable and well-designed software
developed in the software industry. Fig. 1 shows the place of the HepML
language and libraries in the full simulation chain in HEP.



Fig. 1. Place of HepML in the Monte Carlo simulation chain in HEP.



XML has been applied in many programming projects in HEP, mainly in
experimental HEP. In particular, the CMS experiment stores detector geometry
data in XML [16], ATLAS uses XML tools for management of analysis software
[17], and the Monte Carlo simulation program Geant4 [18] stores detector
geometry by means of a special markup language, GDML [19]. XML-based formats
have also proved their usefulness in many other scientific areas. We mention
only several successful examples:

• Chemical Markup Language [20] formalizes the structure of information
about molecules, chemical reactions, analytical data, etc. It is extensively
used in computational chemistry, quantum chemistry, and material sciences. A
lot of chemistry oriented software supports the standard.

• MathML [21] aims at providing the ability to use mathematical notation in
HTML documents. The format is supported in all major Web browsers
and in an enormous number of applications, such as programs for distance
education, computer algebra systems, formula editors, etc.

• CellML [22] was originally created for describing mathematical models in
cardiac research. Now it is used as a description format for computational
models in many other areas of biology and in other sciences, such as
computer science.

Traditional event file formats, such as the standard file headers in
CompHEP [4] and MadGraph [5] or HepMC [23], the output format of
showering/decay Monte Carlo generators, have a fixed structure. This
approach has both benefits and drawbacks. Generally, a format of this type
can look simpler and more human-readable than XML code in ordinary text
editors. It also requires less programming effort. However, any small
correction of an output/input file format of a program forces modification
of both the program itself and all client programs. As we mentioned earlier,
the current simulation chain in HEP is rather long and quite complicated.
So, constant modifications in some parts of the chain cause work in other
parts. More stable file formats, such as HepML, will prevent this.

The next section conveys the general concepts of HepML. The HepML XML
Schemas are described in section 3. Since the format itself is only a markup
language, a software library for interpreting the language is necessary.
Section 4 describes a programming API for this format. It is a C++ library
called libhepml. Section 5 explains how to use the library. To date we have
two software projects which have already adopted the HepML format and/or the
library. Section 6 outlines these projects and the role of HepML in them. We
summarize conclusions and future plans in section 7.



2 General description 

A HepML block in a LHEF event file is an XML document with a pre-defined
structure, i.e. it is a piece of ordinary text marked up with tags. The text
has to contain enough information to understand what kind of physics the
events describe and how the events have been prepared. The only place in a
LHEF file which can accommodate the HepML block is the tag
<header></header>, because the LHEF format does not define the contents of
this tag. Since a LHEF file is not an XML document, a file with a HepML
block cannot be an XML document either. We intentionally do not propose to
store event records in the XML style, since this would add repeated tags to
the records and make the event file larger. Since a typical event file can
be huge, with hundreds of thousands of event records, these new tags in the
event records would make manipulating the files too expensive. On the other
hand, extracting the contents of the tag <header></header> is quite a simple
programming task.
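
As a rough illustration of how little code this extraction needs, the following sketch pulls the header contents out of a LHEF file that has already been read into a string. The function name extractHeader is hypothetical and is not part of libhepml; the sketch assumes a single <header> tag written without attributes and requires C++17 for std::optional.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not a libhepml routine): returns the text between
// <header> and </header>, or std::nullopt if either tag is missing.
// Plain string search is used, so no XML parser is needed at this step.
std::optional<std::string> extractHeader(const std::string& lhef) {
    const std::string open = "<header>";
    const std::string close = "</header>";

    const std::size_t begin = lhef.find(open);
    if (begin == std::string::npos) return std::nullopt;

    const std::size_t contentStart = begin + open.size();
    const std::size_t end = lhef.find(close, contentStart);
    if (end == std::string::npos) return std::nullopt;

    return lhef.substr(contentStart, end - contentStart);
}
```

The returned string could then be handed to an XML parser, while the event records outside the header are never touched.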

Tags in a HepML block are arranged in a fixed structure, which is defined by
several XML Schemas [24]. Tags used in the block can be either standard
ones, i.e. defined in the XML Schemas, or user-defined tags. The latter must
obey XML rules only. The HepML XML Schemas define the data types and the
relative placement of all tags in an XML document. The main principle we
keep in the setup can be formulated in the following manner: "do not remove
or change the standard tags, add your own tags". This means we assume users
will use existing tags from the HepML Schemas, will not change the tags, and
will introduce new tags if and only if the existing ones are not appropriate
for the users' goals. These new tags can be organized into new XML Schemas.
The HepML XML Schemas validate a HepML block, i.e. a validator should
conclude whether a given set of tags is a HepML document or not. If we get a
positive answer, an XML parser can process the document automatically. It is
essential to understand that the HepML Schemas validate the standard tags
only. If a user adds new tags, (s)he should develop code in order to extract
and use information from these new tags in a program. Another obvious
condition for any new tags is that a HepML block with the new tags must
remain an XML document. The meaning of all standard tags is rather clear and
can be derived from the names of the tags (see Appendix of the paper). The
next section describes the HepML XML Schemas in detail.

The second necessary part of HepML is an application programming interface
(API). The main goal of the API is parsing a HepML block in an event file.
Currently our programming interface is realized as a C++ library called
libhepml. The library consists of object classes, which correspond to
complex structures in the HepML XML Schemas, parsing classes, and
serialization classes. All information read from a HepML block is stored in
the object classes of the API. Thus, a user should create instances of the
classes in his/her program and extract the needed information from them.
Sections 4 and 5 describe libhepml in detail.

HepML can be useful in applications of several types. Below we follow the
terminology of [25]. Any ME (Matrix Element) Monte Carlo event generator can
generate a HepML block in an output event file if the program supports the
Les Houches event format. A SH (Showering and Hadronization) Monte Carlo
event generator can use data from the block for further processing of events
from the file. Information stored in a HepML block is useful for describing
event files in databases of such files. As an example of such a database we
shall consider MCDB [26]. The last class of applications can be called event
manipulators. Usually a manipulator is not a stand-alone program, but a part
of a large Monte Carlo program package or an experimental software
environment. The program processes an event file (or several files) and
generates a new event file. For example, a manipulator can apply a new
kinematical cut and store all events which pass the cut in a new event file.
Since all applied cuts are stored in HepML, we should modify the HepML block
and add information about this new cut to the output event file. Another
manipulator mixes several event files into one. In this case we have to
check whether we can mix events from the files at all. If yes, the program
combines the HepML blocks from the files into one HepML block in the output
event file. Parsing the HepML blocks in the input event files is the key
problem in manipulators. Fig. 2 shows the interaction between programs of
different types via HepML blocks and the API routines needed for that. The
same manipulator can be both a producer and a consumer of HepML
information.




Fig. 2. A simple scheme of interaction of programs via HepML data blocks.



The current HepML library depends on either Xerces [27] or Expat [28]. Many
developers try to exclude external dependencies as much as possible. In fact,
libhepml is a necessary tool for parsing only. If a program only writes a
HepML block into an event file, the dependency on XML libraries can be
avoided. In this case the program has to have internal routines for
generating XML tags. Certainly, if the structure of the HepML block changes,
this code should be modified accordingly.
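
Such internal routines can be very small. The sketch below is one possible shape for them, not code taken from any generator: escapeXml and element are hypothetical helper names, and the tag name passed in is the caller's responsibility (the real names are fixed by the HepML Schemas).

```cpp
#include <string>

// Replaces the characters that XML reserves in text content.
std::string escapeXml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;
        }
    }
    return out;
}

// Wraps escaped text in an opening and a closing tag.
std::string element(const std::string& tag, const std::string& text) {
    return "<" + tag + ">" + escapeXml(text) + "</" + tag + ">";
}
```

Escaping matters in practice, because process notations such as "u,D -> b,B" contain the character ">".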



3 HepML XML Schemas 

The HepML XML Schemas provide a general and formal description of the
information about events which should be kept in HepML files. In other
words, the Schemas represent a formal agreement on how to represent and
handle such descriptions. In addition, the Schemas can be used for automated
generation of program code to handle HepML documents; for example, see the
description of libhepml in the next section. Though the natural way to
operate with HepML documents is to use libhepml, having standard and
extensible XML Schemas allows users to make their own implementations of
specific parsers, output routines and validators.

Authors of MC codes can use the XML Schemas in developing I/O routines.
If a routine is consistent with the Schemas, event files generated by the
routine can be read by another program without changes in the input routines
of that program. The Schemas can also be used to validate event files, i.e.
to check whether the files are written according to the HepML
specifications.

One of the main goals of HepML is to store all significant information on
simulated events as well as generator input parameters and setup in XML
documents. The first task, a detailed description of events, is done by the
LCG group; the second task is carried out by CEDAR [29]. Since these
problems are complementary, the Schemas developed by both groups are united
in the main XML Schema of the HepML language, hepml.xsd. Here and below we
consider the LCG Schemas only. Let us mention the main parameters described
in a typical HepML document. It should contain general information about
what kind of problem the events have been prepared for, the generator name,
a description of the physical process, a description of the physical model
in use, the applied cuts, and general information on the event file:

• General information: title, abstract, authors, experiment and/or groups.

• Event files: physical process/subprocesses, the number of events, the total
cross sections and their errors, file name, location(s).

• Physics process: initial and final states, QCD scales, parton density
functions (PDFs) applied, subprocesses.

• Used generator: name and version, description, home page.

• Theoretical model: name, description, parameters and their values with
author's description.

• Applied cuts^1: cut objects, minimal and maximal possible values, and
description.

There are three main Schemas in the LCG part of HepML. The first XML
Schema, lha1.xsd, corresponds to the whole set of parameters composing the
first LHA agreement [25]. The other two Schemas are sample-description.xsd
and mcdb-article.xsd. The first one describes parameters which are necessary
to generate XML data for an event sample. The second Schema determines the
representation of data from LCG MCDB; it is used to form an LCG MCDB
article for the sample. The CEDAR team develops other XML Schemas for
other tasks arising in the automation of data processing in HEP. Now all the
Schemas are unified in one general formal XML Schema, hepml.xsd. This Schema
includes all the other Schemas as sub-Schemas. This solution leaves freedom
to develop new Schemas and software independently, while allowing the
Schemas of both groups to be used in one software project. All the Schemas
are available in [30].



4 Code Structure Description 

In this section we present a library implementing the HepML standard. The
library is called libhepml. Classes of this library can parse HepML blocks in
LHEF files, represent data from the blocks in computer memory, combine
several HepML blocks into one, and serialize C++ objects into HepML
documents in files. Libhepml consists of several types of C++ classes:

• Object classes represent complex types from the HepML XML Schemas.
The main object class is "Article". This class describes a set of event files.
All other object classes implement particular pieces of information in
"Article", such as a physical model, a cut, beams, etc. "Article" includes
these classes as internal objects.

• "Acting" classes. "Parser" is the main parsing class. All the complexity of
parsing XML tags and translating data to object classes is hidden in the
methods of this class. "Mixer" is the main class for merging HepML objects.
"Writer" is the interface class for serialization of HepML documents.

• XML generating classes serialize the contents of an object class to XML
tags.

• Parsing classes parse HepML tags. These internal classes are not intended
for direct use by end-users of the library.



^1 Since an event file can contain events for several independent subprocesses, cuts
in the HepML record are subprocess specific, i.e. there can be several sets of cuts,
one for each subprocess kept in the files.
• Implementing classes translate data from parsing classes to object classes.
Every parsing class has its own implementing class.

• Interacting classes interact with classes of external libraries, currently
with classes of Xerces or Expat.

The first three types of classes realize the user API of the library. The last three
types provide the basic functionality of the library. All these basic internal classes
have been prepared partly by means of XSD [31]. This software generates a set
of classes for parsing XML documents according to an XML Schema. In our
case, the parsing and implementing classes realize the HepML XML Schemas.

Strictly speaking the "Article" class is a container for all information from a 
HepML block. The class interface looks like: 

class Article {
public:
    Article();
    virtual ~Article();

    int& id();
    int& id(int id);
    string& title();
    string& title(const string&);
    string& abstract();
    string& abstract(const string&);
    string& comments();
    string& comments(const string&);
    ExperimentGroup& experimentGroup();
    ExperimentGroup& experimentGroup(const ExperimentGroup&);
    vector<Author>& authors();
    vector<Author>& authors(const vector<Author>&);
    const string postDate();
    Process& process();
    Process& process(const Process&);
    Generator& generator();
    Generator& generator(const Generator&);
    Model& model();
    Model& model(const Model&);
    CutsVector& cuts();
    CutsVector& cuts(const CutsVector&);
    vector<File>& files();
    vector<File>& files(const vector<File>&);
    vector<string>& relatedPapers();
    vector<string>& relatedPapers(const vector<string>&);
};
We can see the class has paired methods with the same names for getting/setting
parameters. Information stored in the parameters corresponds to information
stored in tags of the HepML tag <samples></samples> (see Appendix to the
paper). For example, the class Generator stores information (name, version,
description, homepage) on the Monte Carlo generator which has been used for
simulation of events. The method "abstract()" returns the text of the abstract
of the HepML document.
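
The paired-method convention can be illustrated with a minimal sketch. TitledRecord is a hypothetical stand-in written for this illustration, not a libhepml class; it only assumes, as the interface above suggests, that the no-argument overload returns a non-const reference.

```cpp
#include <string>

// Hypothetical stand-in for an object class with one paired getter/setter.
class TitledRecord {
public:
    // Getter: returns a reference, so the result can also be assigned to.
    std::string& title() { return title_; }

    // Setter: stores the new value and returns the stored member.
    std::string& title(const std::string& t) {
        title_ = t;
        return title_;
    }

private:
    std::string title_;
};
```

Because the getter returns a reference, both `r.title("A")` and `r.title() = "A"` change the stored value.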

The "acting" API classes are "Writer", "Parser", "Mixer". The first class 
generates HepML documents from an "Article" object. The second class is 
responsible for parsing one or several HepML blocks, e.g. ones stored in LHEF 
event files. It fills out an "Article" object. The "Mixer" class merges several 
HepML blocks into a new block. "Mixer" follows the algorithm: 

• Sum cross sections of all event samples and combine their errors; 

• Concatenate string parameters, such as model and generator descriptions, 
authors, etc. The parameters for different HepML blocks are separated with 
semicolons; 

• Compare beams. They must be the same in all HepML blocks. Otherwise 
"Mixer" aborts execution and returns an error; 

• Combine subprocesses from all HepML blocks into one array;

• Combine cut sets. A cut set is added to the array of cut sets if the array
does not have it yet;

• Combining physical models is a more complicated problem. If a model
parameter is not found in the combined model, it is appended to the array of
model parameters. If it is found but has a different value, it is added only
if the special flag "hepml::force::merge" is passed to
"Mixer::mixObjects(...)". Otherwise the library aborts execution with an
error.
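
The rule for model parameters can be sketched as follows. This is an illustration of the rule as stated above, not the code of "Mixer": the Parameter struct and mergeParameters function are hypothetical, and reporting the conflict by throwing std::runtime_error is an assumption made for the sketch.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal model parameter: a name and its value as text.
struct Parameter {
    std::string name;
    std::string value;
};

// Adds `incoming` parameters to `merged`:
//  - an identical (name, value) pair already present is skipped;
//  - an unknown name is appended;
//  - a known name with a different value is appended only if forceMerge
//    is set, otherwise an exception is thrown.
// Note: parameters appended before a conflict is detected stay in `merged`.
void mergeParameters(std::vector<Parameter>& merged,
                     const std::vector<Parameter>& incoming,
                     bool forceMerge) {
    for (const Parameter& p : incoming) {
        bool sameName = false;
        bool sameValue = false;
        for (const Parameter& q : merged) {
            if (q.name == p.name) {
                sameName = true;
                if (q.value == p.value) sameValue = true;
            }
        }
        if (sameValue) continue;
        if (sameName && !forceMerge) {
            throw std::runtime_error(
                "conflicting values for model parameter " + p.name);
        }
        merged.push_back(p);
    }
}
```

With forceMerge set, the merged model may hold the same parameter name several times, once per distinct value.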

Combining several subprocesses from several event files into one array of
subprocesses has one subtle point. Different cuts can be applied in these
subprocesses. Therefore, we have to unify the events into one event sample
but keep these cuts separately. In order to solve the problem we apply the
following algorithm. Every subprocess is assigned an id number. This number
is kept as an attribute in the tag <subprocess>, for example:
<subprocess cutset_id="2">. All cuts applied to events of the subprocess are
grouped inside the tag <cutSet> with the same attribute, for example:
<cutSet cutset_id="2">. All <cutSet> tags are combined in an array located
inside the tag <cuts>. Thus, the final HepML document will have two arrays:
one of subprocesses and one of the corresponding cut sets.
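
A consumer of the merged document can recover the cuts of a given subprocess by matching the shared id. The sketch below shows that lookup with hypothetical in-memory structs (Subprocess, CutSet) and a hypothetical function findCutSet; the corresponding libhepml classes may be organized differently.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory images of <subprocess> and <cutSet>.
struct Subprocess {
    std::string notation;
    int cutsetId;  // value of the cutset_id attribute
};

struct CutSet {
    int cutsetId;                   // value of the cutset_id attribute
    std::vector<std::string> cuts;  // textual descriptions of the cuts
};

// Returns the cut set that carries the same id as the subprocess,
// or nullptr if the array holds no such cut set.
const CutSet* findCutSet(const Subprocess& sp,
                         const std::vector<CutSet>& cutSets) {
    for (const CutSet& cs : cutSets) {
        if (cs.cutsetId == sp.cutsetId) return &cs;
    }
    return nullptr;
}
```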

The last part of the API is the generating classes. They serialize C++
objects, corresponding to object classes, to XML code, i.e. they produce
HepML documents. Every object class has its own generating class. The set of
XML tags generated for the "Article" class can be used either as an
independent XML document or as a block of information which can be included
in files in other formats, e.g. event files in the LHEF format produced with
a Monte Carlo generator.

Figure 3 shows a diagram of the typical interaction between a parsing class
and an object class for filling a HepML object. We use "model parameters"
as an example. For simplicity we omit most of the methods in the classes and
keep only the methods needed for the purpose of the picture. At first,
low-level methods of the interfacing classes parse the whole HepML block and
extract, one by one, the tags corresponding to complex types in the HepML
Schemas. Libhepml has a parsing class for every such tag. Classes of this
type have the postfix "_pskel". The parsing class analyses the tag name and
assigns the tag content to a cache variable. The variable can be of either a
standard type (int, string, etc.) or another complex type (any HepML tag
nested in the tag). These operations are carried out by the method
"start_element_impl". "start_element_impl" also verifies whether the tag
name belongs to the hepml namespace (a C++ realization of the vocabulary of
HepML tags). After that the method "end_element_impl" assigns the cache
variable to the corresponding internal variable in "Article" or another
HepML class, via methods we call assigning methods. However, these assigning
methods are not implemented in the parsing class. Implementing them is the
main goal of the implementing classes. Every implementing class inherits
from the corresponding parsing class. The implementing classes are
instantiated by the "hepml::Parser" class, the main "acting" parsing class
in the API. Since a typical HepML block contains lots of tags, all
corresponding parsing objects have to be combined. This happens via callback
methods called "parsers(...)". Every parsing class has its own "parsers"
method. In terms of the HepML Schemas, the method defines the tags which
have to exist inside the tag that the parsing class corresponds to.
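
The division of labour between a parsing class and its implementing class can be sketched in a few lines. This is a strongly simplified illustration, not the generated libhepml code: the method signatures, the "_pimpl" postfix, the struct ModelParameter, and the nested tag names "name" and "value" are all assumptions made for the example.

```cpp
#include <string>

// Hypothetical object class that receives the parsed data.
struct ModelParameter {
    std::string name;
    std::string value;
};

// Parsing skeleton: recognises tags and caches their content, but leaves
// the assigning methods abstract.
class parameter_pskel {
public:
    virtual ~parameter_pskel() = default;

    // Remember which tag is open and reset the cache.
    void start_element(const std::string& tag) {
        current_ = tag;
        cache_.clear();
    }

    // Accumulate character data of the open tag.
    void characters(const std::string& text) { cache_ += text; }

    // Hand the cached content to the matching assigning method.
    void end_element() {
        if (current_ == "name") name(cache_);
        else if (current_ == "value") value(cache_);
    }

protected:
    virtual void name(const std::string&) = 0;   // assigning methods
    virtual void value(const std::string&) = 0;

private:
    std::string current_;
    std::string cache_;
};

// Implementing class: inherits from the skeleton and writes into the object.
class parameter_pimpl : public parameter_pskel {
public:
    explicit parameter_pimpl(ModelParameter& target) : target_(target) {}

protected:
    void name(const std::string& v) override { target_.name = v; }
    void value(const std::string& v) override { target_.value = v; }

private:
    ModelParameter& target_;
};
```

The skeleton can thus be regenerated from a Schema without touching the hand-written assigning code.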

All classes of libhepml can deal with tags of the HepML Schemas only.
However, it may be necessary to introduce new user-defined tags in HepML
blocks. If one needs a simple tag without any nested tags, a user can modify
the existing parsing/implementing/object classes for a complex type in the
HepML Schemas. If a more complicated set of tags is necessary, a new object
class should be created. There is an example of such a set of classes in the
library.



5 How to use libhepml 

Libhepml can perform tasks of three types: create an article object in a
program and send the object to an output stream, parse a HepML document in a
file and keep information from the block in the article object, and merge
several HepML documents into one. Below we present simplified examples for
each of these tasks.

Fig. 3. The interaction of parsing and implementing classes. Solid lines represent
transfer of data. Dashed lines represent the dependence of "end_element_impl" on
assigning methods. See details in text.

In order to use libhepml, three C++ headers should be added to the code:

#include <hepml/hepml.hpp>   // general object classes
#include <hepml/writer.hpp>  // to produce HepML documents
#include <hepml/parser.hpp>  // to parse and merge HepML documents

Note: if either of the last two headers is included, there is no need for the first one.

At first an instance of the main object class "Article" should be created^2:

Article a;

Let us fill the object with information. There are two different ways to
assign values to parameters of the object. For example:

a.title("p,p->Wbbj->l,nu,b,b,j process from CompHEP");

or

a.abstract() = "There are about 1.1M events for the process "
               "p,p->W,b,b,j with leptonic decays of W-boson.";

^2 We assume the hepml namespace is made visible in the source file via the standard
C++ instruction "using namespace hepml;". So, all libhepml classes are used without
the namespace prefix "hepml::".
The object classes have several arrays for authors, model parameters,
subprocesses, and cuts. Let us create an author record and add it to the
article object:

Author author;
author.firstName() = "James";
author.lastName() = "Johnson";
author.email() = "James.Johnson@nospamcern.ch";
author.experiment() = "CMS";
author.experimentGroup() = "CMS Top group";
author.organization() = "CERN";
a.authors().push_back(author);

After that we should introduce and describe a physical process in the article. 
At first, we describe initial beams: 

Process p;
p.beam1().particle = Beam::Particle("p");
p.beam1().energy = Beam::Energy(7000, Beam::Energy::GeV);
p.beam1().pdf.name = "CTEQ6L";
p.beam1().pdf.lhaPdfSet = 3;
QcdCoupling qc = QcdCoupling();
qc.lambda = 2.0;
qc.nLoopsAlpha = 2;
qc.nFlavours = 5;
p.beam1().qcdCoupling = qc;

The second beam is set in the same manner (certainly, beam2() should be used
instead of beam1()). We then have to describe the final state, assign information
on the cross section, and add the description of the process to our article:

p.finalState() = "l,nu,b,b,j";
p.finalStateNotation().plain = "l,nu,b,b,j";
p.finalStateNotation().html = "<i>l,nu,b,b,j</i>";
CrossSection cs = CrossSection(22.78, CrossSection::pb);
cs.errorPlus = cs.errorMinus = 0.02;
p.crossSection() = cs;
a.process() = p;

If we have several subprocesses in the process, we add them to the subprocesses 
array in the article object: 

Subprocess sp;
sp.notation() = "u,D -> nm,M,G,b,B";
sp.crossSection().value = 1.3221;
sp.crossSection().unit = CrossSection::pb;
sp.crossSection().errorPlus =
    sp.crossSection().errorMinus = 2.68e-03;
a.process().subprocesses().push_back(sp);

We should keep information on a generator and append the information to 
the article: 

Generator gen;
gen.name() = "CompHEP";
gen.version() = "4.5.2";
gen.homepage() = "http://comphep.sinp.msu.ru";
gen.description() = "Funny Monte Carlo event generator";
a.generator() = gen;

The next important step is an introduction of a physical model: 

Model& m = a.model();  // work directly on the model stored in the article
m.name() = "Standard Model";
m.description() = "There can be a long and detailed "
                  "description of the model";

Model parameters should be added one by one: 

Model::Parameter param = Model::Parameter("Ms", "0.117");
param.mathNotation().plain = "Ms";
param.mathNotation().html = "m<sub>s</sub>";
param.description() = "parameter1 description";
m.parameters().push_back(param);

If any kinematical cuts have been applied we can add a description of the cuts:

Cut cut;
cut.object() = "M(l,nu)";
cut.minValue() = "100 GeV";
cut.maxValue() = "";
cut.objectNotation().html = "html object notation";
a.cuts().push_back(cut);

After that we add information about the event file:

File f;
f.eventsNumber() = 195644;
f.size() = 407736817;
f.crossSection() = CrossSection(22.78, CrossSection::pb);
f.checksum().type = File::Checksum::md5;
f.checksum().value = "8957a237dc062b96987a21c86774eb5e";
a.files().push_back(f);
As soon as the article is created and filled with the necessary information,
it can be serialized to an output stream, for example to the standard output
stream:

hepml::Writer writer;
std::cout << writer.toHepml(a);

In this case the article will be printed out in the form of a HepML block.
If a parameter in the "Article" object is required (according to the HepML
Schemas) and is undefined, it will be assigned a reasonable value: an
empty string for a string parameter, 0 for an int, etc. If a default value
of the parameter is specified in the HepML Schemas, that value will be used.

The second typical task performed by libhepml is HepML parsing. In order to
parse a HepML block in a file and fill out an instance of the "Article" class,
we define instances of the "Article" and "Parser" classes and parse the
LHEF file by means of the parseObject method:

Article a;
Parser parser;

::std::string file = "hepml_examples/general/example1.xml";
parser.parseObject(a, file);

After that, all information from the HepML block is available in the object
"a". We can manipulate the information. For example:

cout << "Generator name: " << a.generator().name() << endl;
cout << "Model: " << a.model().name() << endl;
cout << " with parameters:" << endl;
for (size_t i = 0; i < a.model().parameters().size(); ++i) {
    cout << " name: " << a.model().parameters()[i].name()
         << ", value: " << a.model().parameters()[i].value() << endl;
}

If we have several LHEF files and want to merge the HepML blocks in the
files, we can use the "Mixer" class and its mixObjects method; the second
argument should be a vector of file names:

Article article;
Mixer mixer;

vector< ::std::string> files(2);
files[0] = "hepml_examples/general/example1.xml";
files[1] = "hepml_examples/general/example2.xml";
try {
    mixer.mixObjects(article, files);
}
catch (const ::std::exception& m) {
    // report the reason of the failure before exiting
    std::cerr << m.what() << std::endl;
    return 1;
}

The merging algorithm was explained in detail in the previous section. The
only subtle point here is the merging of models in which a parameter has
different values in different HepML blocks. In order to merge the HepML
blocks in this case we should use a special flag:

try {
    mixer.mixObjects(article, files, force::merge);
}
catch (const ::std::exception& m) {
    // report the reason of the failure before exiting
    std::cerr << m.what() << std::endl;
    return 1;
}
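
The role of such a flag can be illustrated with a small standalone sketch.
It does not use libhepml, and every name in it (ParameterMap, MergePolicy,
mergeParameters) is hypothetical. In particular, keeping the value that is
already present when a forced merge meets a conflict is an assumption made
only for this illustration; the actual behaviour of the library is not
specified in this section.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a list of model parameters: name -> value.
typedef std::map<std::string, std::string> ParameterMap;

// Hypothetical policy flag, playing the role of the special merge flag.
enum MergePolicy { StrictMerge, ForceMerge };

// Merge `extra` into `merged`. A parameter with different values in the two
// maps is a conflict: StrictMerge throws, ForceMerge keeps the value already
// present in `merged` (an assumption of this sketch). Values are compared
// as strings, so "0.117" and "0.1170" count as different.
void mergeParameters(ParameterMap& merged, const ParameterMap& extra,
                     MergePolicy policy) {
    ParameterMap result = merged;  // work on a copy: no partial merge on error
    for (ParameterMap::const_iterator it = extra.begin();
         it != extra.end(); ++it) {
        ParameterMap::iterator found = result.find(it->first);
        if (found == result.end()) {
            result.insert(*it);
        } else if (found->second != it->second && policy == StrictMerge) {
            throw std::runtime_error(
                "conflicting values for parameter " + it->first);
        }
    }
    merged.swap(result);
}
```

In this sketch a strict merge of two maps that disagree on "Ms" throws and
leaves the first map untouched, whereas a forced merge succeeds and only
adds the parameters that were missing.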



6 Current HepML applications 

6.1 LCG MCDB 

In recent years there has been a need for a common place to store
sophisticated Monte Carlo event samples prepared by experienced theorists.
Such samples should also be accessible in a standardized manner, so that they
can be easily imported and used in the software environments of experiments.
The main motivation behind the LCG MCDB project [26,33] is to make
sophisticated Monte Carlo event samples and their structured descriptions
available to various groups of physicists working at the LHC. All data in
MCDB are accessible to end users in several convenient ways: from the Grid,
on the Web, and via an application program interface.

The main content of MCDB consists of event files and their detailed
descriptions. These descriptions are fully compatible with the information
which can be provided in HepML blocks. So, the main way to automate access to
MCDB is to use HepML documents in the interaction with MCDB. An event sample
description can be either exported from MCDB or uploaded to the database. In
other words, an MCDB article can be obtained as a HepML document and,
conversely, a new article can be created automatically in MCDB using a HepML
description of an event file.

The CMS collaboration [34] already uses MCDB in its productions of simulated
events. Event files can be downloaded from MCDB by means of internal routines
of the CMSSW software simulation environment. The routines are based on
classes of libhepml. There is also an option to upload new event files to
MCDB using a special uploading script. The script supports LHEF files with
two types of header blocks, HepML and MadGraph.

Fig. 4. A simple scheme of interaction of MCDB and CMSSW in Monte Carlo
productions.



6.2 CompHEP Monte-Carlo generator 



The Monte Carlo event generator CompHEP produces events on the partonic 
level for particle decays and particle collisions at colliders. The main file format 
for the event files in CompHEP is LHEF, although two obsolete native event 
file formats are still in use for backward compatibility. More information about 
the program can be found in [4,35]. 

Since CompHEP generates partonic level events, it is a natural target for
implementing HepML. Currently, by default, CompHEP needs neither an XML
library nor libhepml in order to produce a HepML block, since generating XML
code is a rather simple programming task and the output format is rather
stable. The block is constructed with internal CompHEP routines. However,
CompHEP can also be compiled with libxml2, the standard GNOME XML library.

There are several tasks in which the CompHEP HepML block has to be parsed
and modified, which means that the libxml2 parser has to be used. CompHEP
generates events per subprocess, i.e. a physical process with fundamental
model particles in the initial and final states. However, in many cases we
need to sum over several initial or final states. For example, this happens
if we have a composite particle, such as a proton, in the initial state. We
therefore end up with several event files, which should be combined into one
event sample. This problem is a particular case of the merging problem
discussed in Sect. 4: the HepML blocks of the subprocess event files should
be merged into one block. CompHEP has a special mixing program for this,
called "mix". If "mix" is linked against libxml2, it combines all HepML
blocks from the input files and adds a new block to the final event file.
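
When subprocess samples are combined, the cross sections recorded in the
individual HepML blocks determine how the samples relate to each other. The
following is a minimal sketch of that bookkeeping and not the algorithm of
the CompHEP "mix" program, which is not described here. It assumes that the
total cross section of the combined process is the sum of the subprocess
cross sections, and that in an unweighted combined sample each subprocess
contributes events in proportion to its cross section. The function name
mixingFractions is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Given the cross sections of the individual subprocess samples (all in the
// same unit, e.g. pb), return the fraction of events that each subprocess
// contributes to an unweighted combined sample: sigma_i / sum_j sigma_j.
std::vector<double> mixingFractions(const std::vector<double>& sigma) {
    double total = 0.0;
    for (std::size_t i = 0; i < sigma.size(); ++i) {
        if (sigma[i] < 0.0)
            throw std::invalid_argument("negative cross section");
        total += sigma[i];
    }
    if (total <= 0.0)
        throw std::invalid_argument("total cross section must be positive");

    std::vector<double> fractions(sigma.size());
    for (std::size_t i = 0; i < sigma.size(); ++i)
        fractions[i] = sigma[i] / total;
    return fractions;
}
```

For two subprocesses with cross sections of 1 pb and 3 pb, the sketch gives
fractions of 0.25 and 0.75, and the combined cross section to be written to
the merged block would be 4 pb.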

Since two authors of the paper are also members of the CompHEP collaboration,
we plan to expand the usage of HepML in CompHEP. Namely, we plan to add HepML
support to the "addcut" and "cascade" programs. The former program applies a
new cut to events kept in a CompHEP event file, so we have to add the cut to
the HepML block in the output event file. The latter program replaces heavy
decaying resonances in final states with their decay products. Therefore the
output event file contains subprocesses with different final states, and a
list of these new subprocesses should replace the old list of subprocesses.
In the future, we plan to use libhepml instead of libxml2 in CompHEP, since
this will significantly simplify the code of the programs where HepML is
used.



7 Conclusion and Plans 

In this article we present HepML, a new markup language for describing events
at the partonic level in a uniform and flexible manner. Blocks of HepML tags
can be easily embedded into LHEF files. HepML allows Monte Carlo event
generators to prepare self-documented event files, i.e. all necessary
information about the events, such as the physical model, applied cuts, etc.,
is kept inside the event file. HepML is constructed as an extensible
language: users can extend the set of XML Schemas or add new tags to the
format. The only requirement for these new tags is that they should follow
XML rules. The structure of a HepML block is defined by several HepML XML
Schemas.

HepML is equipped with an API in the form of a C++ library called libhepml.
The library consists of object classes for representing information from a
HepML block in computer memory, parsing classes, and classes for serializing
the object classes into HepML tags. The library can use either the Xerces or
the Expat XML parser for low-level parsing of XML tags. It is possible to add
processing of new user-defined tags to the library. Libhepml provides a
unified interface for automatic event description at different levels of
Monte Carlo simulation in HEP.






The developed HepML Schemas, documentation and code of the libhepml library
are publicly available on the MCDB web server [33].

There are several projects which have already started to exploit HepML. LCG 
MCDB uses HepML information in all its external interfaces. The CompHEP 
Monte Carlo event generator adds HepML blocks into event files and uses the 
blocks for mixing of several event files into one event file. HepML has been 
implemented in the software environment of the CMS collaboration (CMSSW) 
in order to document externally simulated event samples, kept in LCG MCDB. 

We are going to develop the project further in the framework of an open 
source project [36] and encourage people interested in development of XML- 
based formats to join the project. 
9 Appendix. HepML tags.

The main LCG HepML Schema consists of a number of tags. Here we collect 
information about the tags: 

• <samples> is the root tag of a HepML block. It contains information on
each event file (in the <files> tag) and a common description of events in
the files (in the <description> tag).

• <description> describes an article. It should have several simple tags
(<title>, <abstract>, <authorComments>, <relatedPapers>) and several tags
with nested tags (<experimentGroup>, <generator>, <model>, <process>,
<cuts>, <authors>). A simple tag contains only text and does not have any
nested tags.

• <authors> contains a list of all authors of the event sample.

• <author> describes an author, i.e. contains tags for the author's first
and last names, his/her email and affiliation (group, experiment,
organization).

• <title> contains the title of the article.

• <abstract> contains an abstract of the article.

• <relatedPapers> contains a list of related articles.

• <authorComments> contains additional authors' comments on the article.
All information not marked up with tags should be stored within this tag.

• <experimentGroup> contains information about the experiment and/or the
group which produced the events. It consists of several simple tags
(<experiment>, <group>, <responsiblePerson>, and <description>).

• <generator> contains information on the Monte Carlo event generator used.
There are several simple tags in the tag: <name>, <version>, <homepage>,
and <description>.

• <model> describes the physical model for the events, i.e. all parameters
in the model. It has <name> and <description> tags, and an array of
parameters within the <parameters> tag. Each <parameter> describes an
element of the model by means of four tags: <name>, <value>,
<description>, and <notation>.

• <process> describes a physical process. It contains several tags:

  ◦ <beam1>, <beam2> describe the initial beams. Each beam tag defines the
  particle info (tag <particle>), the energy (tag <energy>), the structure
  functions (tag <pdf>), and information on the QCD coupling related to the
  structure function (tag <QCDCoupling>).

  ◦ <QCDCoupling> defines information on the QCD coupling if it is not
  defined in the beams.

  ◦ <finalState> defines a final state for the process.

  ◦ <crossSection> sets the total cross section of the process.

  ◦ <subprocesses> defines a list of subprocesses for this physical process.
  Each subprocess has the following characteristics: notations for the
  initial and final states, the total cross section, and the factorization
  and renormalization scales.

• <cuts> contains a list of applied cuts. These cuts are grouped in several
cut sets (tag <cutSet>). <cutSet> has a special attribute (cutset_id), which
couples a <cutSet> tag and a <subprocess> tag: if the two tags have the same
value of the attribute, the cuts from the <cutSet> tag are applied to events
of the subprocess.

• <cut> defines a cut.

• <files> is a list of event files. This element can be used either
separately or inside the <samples> tag.

• <file> contains information about one event file. The type of the file can
be specified in the attribute type. It has the following nested tags:

  ◦ <eventsNumber> - the number of events in the file.

  ◦ <crossSection> - the total cross section for events in the file.

  ◦ <fileSize> - the file size in bytes.

  ◦ <checksum> - a checksum of the file (the type of the checksum is
  specified in the type attribute).

  ◦ <comments> - an author's comment for the file.

  ◦ <location> - a list of locations of the file (CASTOR, Grid or others).
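
The coupling between cut sets and subprocesses through the cutset_id
attribute, described in the list above, can be pictured with a small
standalone sketch. It does not use libhepml: the structures Cut and
Subprocess and the function cutsFor are hypothetical, and the attribute value
is assumed to be an integer purely for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory counterparts of <cut> and <subprocess>.
struct Cut {
    std::string object;    // e.g. "M(l,nu)"
    std::string minValue;  // e.g. "100 GeV"
    std::string maxValue;  // empty string: no upper limit
};

struct Subprocess {
    std::string notation;  // free-form label of the subprocess
    int cutSetId;          // value of the cutset_id attribute (assumed int)
};

// Cut sets indexed by the value of their cutset_id attribute.
typedef std::map<int, std::vector<Cut> > CutSets;

// Return the cuts applied to events of the given subprocess, i.e. the cut
// set with the same cutset_id, or an empty list if there is no such set.
const std::vector<Cut>& cutsFor(const Subprocess& sp, const CutSets& sets) {
    static const std::vector<Cut> none;
    CutSets::const_iterator it = sets.find(sp.cutSetId);
    return it == sets.end() ? none : it->second;
}
```

A subprocess whose cutset_id matches no cut set simply has no cuts in this
sketch; whether the HepML Schemas allow such a dangling identifier is not
addressed here.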